The following is an investigation into lego brick colours. The idea was inspired by Hanna Yan Han’s article here. The data for this is from rebrickable. From which I created a local MySQL DB to access this data (DB set up notes are here).
This project was undertaken based on an inspiration from Hanna Yan Han, it utilised the local database I’d created, and it aligned with my life long interest in Lego.
From focusing on the colour of Lego Bricks:
Looking at sets between 1958 and 1967
The Lime bricks from the 80’s that stood out
Data trends in teh 2000’s
# Import Library
library(RMySQL)
library(plotwidgets)
library(ggplot2)
library(reshape2)
# library(extrafont)
# DB connection
# source("/Users/steveuser/Documents/Repositories/Lego_data/getDBConn.r") # Mac path
source("C:/Users/SteveLocal/Documents/Repos/Lego_data/getDBConn.r") # Windows path
# Custom plot theme
custom_theme <- function() {
font <- "mono"
theme_light() +
theme(
legend.position = "none",
plot.title = element_text(family = font,
size = 16,
face = "bold",
hjust = 0,
vjust = 2),
plot.subtitle = element_text(family = font,
size = 12),
plot.caption = element_text(family = font,
size = 9,
hjust = 1),
axis.title = element_text(family = font,
size = 10),
axis.text = element_text(family = font,
size = 9),
axis.text.x = element_text(margin = margin(5, b = 10))
)
}
theme_set(custom_theme())
The above code chunk has default settings
The data for all the colour analysis can be found in the ‘colors’
table in the Lego DB. For plotting: The individual colours values for
both RGB and HSL were pull out into their own columns. Also, a mapping
list was created to map lego brick colour to lego brick colour name
(makes life easy when manually setting the colour scale
scale_color_manual()).
# Note: 'conn' was generated from a script to hide the user details
# dbGetQuery is from the 'RMySQL' library which is loaded in the above script
data_from_mysql <- dbGetQuery(conn, "SELECT * FROM colors")
# Split colour values out for plotting
data_from_mysql$rgb_hex <- paste0("#", data_from_mysql$rgb)
# HSL values
data_from_mysql$H <- col2hsl(data_from_mysql$rgb_hex)[1,]
data_from_mysql$S <- col2hsl(data_from_mysql$rgb_hex)[2,]
data_from_mysql$L <- col2hsl(data_from_mysql$rgb_hex)[3,]
# RGB values
data_from_mysql$R <- col2rgb(data_from_mysql$rgb_hex)[1,]
data_from_mysql$G <- col2rgb(data_from_mysql$rgb_hex)[2,]
data_from_mysql$B <- col2rgb(data_from_mysql$rgb_hex)[3,]
# Mapping Lego Brick colour to name for manual scale plotting
fill_colour_value_mapping_byName <- list()
for(i in 1:nrow(data_from_mysql)){
fill_colour_value_mapping_byName[i] <- data_from_mysql$rgb_hex[i]
}
names(fill_colour_value_mapping_byName) <- data_from_mysql$name
# Addition non-lego brick colours for ease of plotting
fill_colour_value_mapping_byName <- append(fill_colour_value_mapping_byName, c(None = NA))
The following explores Lego brick colours as well as the addition data incorporated in the ‘colors’ database table, which includes:
num_parts)num_sets)y1)y2)The following 2 charts show all the Lego brick colours from 1949 to 2015 (when the data was downloaded) based on Hue, Saturation and Lightness (HSL).
The final chart shows the colours that were used per year between 1949 and 2015.
# Saturation vs Lightness, Fill <- Hue
ggplot() +
geom_point(data = data_from_mysql,
mapping = aes(x = L, y = S, fill = name),
shape = 21,
stroke = 0.5,
size = 5,
colour = "black",
alpha = 1.0) +
scale_fill_manual(values = fill_colour_value_mapping_byName) +
scale_x_continuous(breaks = c(0, 1)) +
scale_y_continuous(breaks = c(0, 1)) +
labs(title = "All Lego brick colours: Lightness vs Saturation",
subtitle = "273 different colours from 1949 to 2025",
x = "Lightness",
y = "Saturation") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
All the Lego brick colours. Saturation vs Lightness, fill is brick colour (not hue).
The above plot shows all the different coloured Lego bricks between 1949 and 2015. This indicates a higher density of colours with high (1) saturation.
# Radial: Hue vs Saturation, fill <- Hue
ggplot(data = data_from_mysql,
mapping = aes(x = H, y = S, fill = name)) +
geom_point(shape = 21,
stroke = 0.5,
size = 5,
colour = "black",
alpha = 1.0) +
coord_polar(theta = "x") +
scale_fill_manual(values = fill_colour_value_mapping_byName) +
scale_x_continuous(breaks = c(0, 360)) +
scale_y_continuous(breaks = seq(0, 1, 0.2)) +
labs(title = "Lego brick colour: Saturation vs Hue",
subtitle = "Red areas highlight sparsely populated brick colours",
x = "Hue",
y = "Saturation") +
geom_rect(aes(xmin = 85, xmax = 180,
ymin = 0.65, ymax = 1,
color = "black", fill = "None", alpha = 0.5)) +
geom_rect(aes(xmin = 235, xmax = 335,
ymin = 0.5, ymax = 1,
color = "black", fill = "None", alpha = 0.5))
All the Lego brick colours. Saturation vs Hue, fill is brick colour (not hue).
When you look at the same colours with Hue on a polar axis there seems to be 2 areas that have a lower density of brick colours. These are highlighted by the read rectangles. This implies there are low numbers of bricks in the purple/pink and green colours with saturation above 0.5.
# Dataframe to capture each year a colour was used
data_colour_by_year <- data.frame(
year = seq(min(data_from_mysql$y1, na.rm = TRUE),
max(data_from_mysql$y2, na.rm = TRUE),
1))
for (i in 1:nrow(data_from_mysql[order(data_from_mysql$S), ])){
name_label = data_from_mysql$name[i]
data_colour_by_year[[name_label]] <- as.numeric(
data_colour_by_year$year >= data_from_mysql$y1[i] &
data_colour_by_year$year <= data_from_mysql$y2[i])
}
data_colour_by_year <- melt(data = data_colour_by_year,
id.vars = "year",
variable.name = "name",
value.name = "present")
data_colour_by_year$present[is.na(data_colour_by_year$present)] <- 0
data_colour_by_year$H <- NA
for (i in 1:nrow(data_from_mysql)){
data_colour_by_year$H[which(data_from_mysql$name[i] == data_colour_by_year$name)] <- data_from_mysql$H[i]
}
ggplot(data = data_colour_by_year,
mapping = aes(x = year, y = present, fill = reorder(name, H))) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_manual(values = fill_colour_value_mapping_byName) +
labs(title = "Lego brick colour: Colour count by year",
subtitle = "",
x = "Year",
y = "Colour count")
Number of Lego brick colours used per year. Fill is brick colour
The above plot shows the number of colours used per year, it can be observed from this that:
The above leaves me with the following questions:
The following section explores the aforementioned questions.
Previous analysis showed that between 1958 and 1967 there was a plateau in the number of colours used in Lego sets (median 18 colours). The following section explores what lego sets were available in this time period and what colours made up the bricks in these sets.
The following 2 plots show the themes available between 1958 and 1967, detailing the number of bricks by colour in each theme, as well as the number of sets per theme.
data_sixty_brickCol <- dbGetQuery(conn,
"SELECT themes.name as 'themes_name',
colors.name as 'colour_name',
COUNT(parts.part_num) as 'part_count'
FROM sets
INNER JOIN themes ON sets.theme_id = themes.id
INNER JOIN inventories ON sets.set_num = inventories.set_num
INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
INNER JOIN colors ON inventory_parts.color_id = colors.id
INNER JOIN parts ON inventory_parts.part_num = parts.part_num
WHERE (sets.year >= 1958 AND sets.year <=1967)
GROUP BY themes_name, colour_name
ORDER BY themes_name;")
data_sixty_setTheme <- dbGetQuery(conn,
"SELECT themes.name as 'themes_name',
COUNT(sets.set_num) as 'sets_count'
FROM sets
INNER JOIN themes ON sets.theme_id = themes.id
WHERE (sets.year >= 1958 AND sets.year <=1967)
GROUP BY themes_name
ORDER BY themes_name;")
data_sixty_setTheme$sets_label <- NA
for (i in 1:nrow(data_sixty_setTheme)) {
data_sixty_setTheme$sets_label[i] <- paste(data_sixty_setTheme$sets_count[i], " sets")
}
# head(data_sixty_brickCol, n = 10)
# head(data_sixty_setTheme, n = 10)
#
ggplot(data = data_sixty_brickCol,
mapping = aes(x = themes_name, y = part_count, fill = colour_name)) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_manual(values = fill_colour_value_mapping_byName) +
labs(title = "Lego brick colour: 1958 to 1967, themes",
subtitle = "Number of bricks by colour in each theme",
x = "Theme name",
y = "Brick count") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Number of bricks in themes from 1958 to 1967. Fill is brick colour.
ggplot(data = data_sixty_brickCol,
mapping = aes(x = 5, y = part_count, fill = colour_name)) +
geom_bar(stat = "identity", position = "fill", colour = "black", linewidth = 0.1) +
geom_text(data = data_sixty_setTheme, mapping = aes(x = -Inf, y = -Inf, fill = NA, label = sets_label)) +
coord_polar(theta = "y") +
scale_x_continuous(limits = c(3, NA)) +
scale_fill_manual(values = fill_colour_value_mapping_byName) +
facet_wrap(~ themes_name) +
labs(title = "Lego brick colour: 1958 to 1967. themes",
subtitle = "The number of sets within a theme are labeled for each plot",
x = "Theme name",
y = "Colour of bricks (% of brick in theme)") +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
Number of bricks in themes from 1958 to 1967. Fill is brick colour, number of sets per theme labelled in centre of each plot.
It can be seen in the above that most the themes have the same colours with the exception of:
Database SetsHO 1:87 VehiclesModulexAlso, themes like Database Sets &
Samsonite in this time period are less related to the
content of the sets, but logistical categorisation i.e. according to brickipedia
Samsonite was US manufacturer of Lego.
data_sixty_setsParts <- dbGetQuery(conn,
"SELECT sets.name as 'sets_name',
COUNT(parts.part_num) as 'part_count'
FROM sets
INNER JOIN themes ON sets.theme_id = themes.id
INNER JOIN inventories ON sets.set_num = inventories.set_num
INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
INNER JOIN colors ON inventory_parts.color_id = colors.id
INNER JOIN parts ON inventory_parts.part_num = parts.part_num
WHERE (sets.year >= 1958 AND sets.year <=1967)
GROUP BY sets_name
ORDER BY part_count DESC;")
data_sixty_setsParts_col <- dbGetQuery(conn,
"SELECT sets.name as 'sets_name',
colors.name as 'colour_name',
COUNT(parts.part_num) as 'part_count'
FROM sets
INNER JOIN themes ON sets.theme_id = themes.id
INNER JOIN inventories ON sets.set_num = inventories.set_num
INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
INNER JOIN colors ON inventory_parts.color_id = colors.id
INNER JOIN parts ON inventory_parts.part_num = parts.part_num
WHERE (sets.year >= 1958 AND sets.year <=1967)
GROUP BY sets_name, colour_name
ORDER BY part_count DESC;")
data_sixty_setsParts$part_label <- NA
for (i in 1:nrow(data_sixty_setsParts)) {
data_sixty_setsParts$part_label[i] <- paste(data_sixty_setsParts$part_count[i], " parts")
}
# head(data_sixty_setsParts, n = 14)
# head(data_sixty_setsParts_col, n = 10)
# Select top sets
top_sets_index <- logical(nrow(data_sixty_setsParts_col))
for (i in 1:16) {
top_sets_index <- top_sets_index | (data_sixty_setsParts_col$sets_name == data_sixty_setsParts$sets_name[i])
}
ggplot(data = data_sixty_setsParts_col[which(top_sets_index), ],
mapping = aes(x = 5, y = part_count, fill = colour_name)) +
geom_bar(stat = "identity", position = "fill", colour = "black", linewidth = 0.1) +
geom_text(data = data_sixty_setsParts[1:16, ], mapping = aes(x = -Inf, y = -Inf, fill = NA, label = part_label), size = 3) +
coord_polar(theta = "y") +
scale_x_continuous(limits = c(3, NA)) +
scale_fill_manual(values = fill_colour_value_mapping_byName) +
facet_wrap(~ sets_name) +
labs(title = "Lego brick colour: 1958 to 1967, sets",
subtitle = "The number of parts within a theme are labeled for each plot",
x = "Top 16 Lego sets by number of parts",
y = "Colour of bricks (% of brick in theme)") +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.text = element_text(size = 6))
% of bricks by colour in theme from 1958 to 1967. The number of parts per set is labelled in the centre of each plot.
Addressing the original question of what makes up
the bricks between 1958 and 1967, the majority of parts are in an
‘unused’ category (Unused Modulex parts sold by LEGO &
Unused Parts Database Set - Pre-1965, with 1863 & 200
parts respectively). It is assumed that these parts were not
commercially available, but were held in the database for completeness.
So removing these 2 sets from the analysis, the top 3 sets with the most
parts are:
Deluxe Building Set with 144 partsStarter Train Set with Motor with 143 partsBasic Building Set with Train with 116 partsWith a quick enquiry (see below) is can be seen that there were 2 colours that were introduced and ended in the 80’s:
# Find colours only in the 80's
head(data_from_mysql$name[data_from_mysql$y1 > 1980 &
data_from_mysql$y2 < 1990 &
!is.na(data_from_mysql$y1)], n = 10)
## [1] "Fabuland Orange" "Fabuland Lime"
Filtering on ‘Fabulous Lime’ shows 15
parts:
data_lime_bricks <- dbGetQuery(conn,
"SELECT sets.set_num as 'sets_num',
sets.name as 'sets_name',
colors.name as 'color_name',
parts.name as 'part_name'
FROM sets
INNER JOIN themes ON sets.theme_id = themes.id
INNER JOIN inventories ON sets.set_num = inventories.set_num
INNER JOIN inventory_parts ON inventories.id = inventory_parts.inventory_id
INNER JOIN colors ON inventory_parts.color_id = colors.id
INNER JOIN parts ON inventory_parts.part_num = parts.part_num
WHERE colors.name = 'Fabuland Lime';")
# head(data_lime_bricks, n = 10)
data_lime_bricks
## sets_num sets_name color_name part_name
## 1 3715-1 Flower Stand Fabuland Lime Fabuland, Equipment Watering Can
## 2 3781-1 Maximillian Mouse Fabuland Lime Fabuland, Equipment Watering Can
## 3 3707-1 Clover Cow Fabuland Lime Fabuland, Equipment Watering Can
## 4 2770-1 Play House Fabuland Lime Duplo Bathroom Sink
## 5 2770-1 Play House Fabuland Lime Duplo Bathtub
## 6 2770-1 Play House Fabuland Lime Duplo Shower Head on Stand
## 7 2770-1 Play House Fabuland Lime Duplo Mirror with Silver Print
## 8 2770-1 Play House Fabuland Lime Duplo Toilet (without Rim)
## 9 2754-1 Bathroom Fabuland Lime Duplo Chair 2 x 2 x 2 with One Stud
## 10 2754-1 Bathroom Fabuland Lime Duplo Cabinet 2 x 2 x 1.5
## 11 2754-1 Bathroom Fabuland Lime Duplo Bathroom Sink
## 12 2754-1 Bathroom Fabuland Lime Duplo Bathtub
## 13 2754-1 Bathroom Fabuland Lime Duplo Shower Head on Stand
## 14 2754-1 Bathroom Fabuland Lime Duplo Mirror with Silver Print
## 15 2754-1 Bathroom Fabuland Lime Duplo Toilet (without Rim)
So, to answer the proposed question: In the above it can be seen that the ‘Fabuland Lime’ was used to make 8 different bricks. 7 bricks made up parts of a duplo bathroom from 2 sets and the final brick was a watering can used across 3 sets.
Image of Lego set with Fabuland Lime bricks.
It is known (see here) that in the early 2000’s Lego as a business was struggling, this struggle peaked with a reported 1.9 billion kr deficit in their 2004 annual report. However, the business is held as an exempla of a business turn around of with a reported 13.8 kr billion profit in 2022.
The turn around from 2005 is accredited to focusing on ‘the bricks’ (what Lego does well) which is still a highly rated child’s toy.
So, how are these times of turmoil reflected in the rebrickable data base?
The following plots look at the introduction, retirement and availability of themes and sets over time.
# Get data relating to themes and sets over time
data_themes <- dbGetQuery(conn,
"SELECT themes.name as 'themes_name',
MIN(sets.year) as 'min_year',
MAX(sets.year) as 'max_year',
COUNT(sets.name) as 'num_of_sets'
FROM sets
INNER JOIN themes ON sets.theme_id = themes.id
GROUP BY themes_name
ORDER BY themes_name;")
# Number of sets per year the theme existed
data_themes$sets_per_year <- data_themes$num_of_sets / (1 + data_themes$max_year - data_themes$min_year)
# Plot total data per year
data_themes_by_year <- data.frame(
year = seq(min(data_themes$min_year, na.rm = TRUE),
2024, 1) # Limit years to 2024
)
for (i in 1:nrow(data_themes)){
name_label = data_themes$themes_name[i]
data_themes_by_year[[name_label]] <- as.numeric(
data_themes_by_year$year >= data_themes$min_year[i] &
data_themes_by_year$year <= data_themes$max_year[i])
}
data_themes_by_year <- melt(data = data_themes_by_year,
id.vars = "year",
variable.name = "name",
value.name = "present")
data_themes_by_year$present[is.na(data_themes_by_year$present)] <- 0
ggplot(data = data_themes_by_year,
mapping = aes(x = year, y = present, fill = name)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Lego brick themes: Theme count by year",
subtitle = "",
x = "Year sets were introduced from a theme",
y = "Theme count") +
scale_fill_grey()
Number of themes vs the year sets were available
The above shows that the number of themes has increased from 1949 to a peak around 2015/2016 (both with 133 themes). This has then dropped to 112 themes by 2024.
Prior to 2004 (year of the largest reported business deficit) there was a increase rate in theme introduction starting in 1990 and plateauing in early 2000’s. Any correlation of this increased rate to the 2004 deficit is unknown, however it is reported that Lego started to license the rights to popular entertainment franchises (like Star Wars in 1999, various Disney in 2000, Harry Potter in 2001) with there being 8 & 13 new licenses in 2015 & 2016 respectively which align with the peak number of themes.
# Original plots - changed to display difference plot below
ggplot(data = data_themes[data_themes$min_year < 2025 & data_themes$max_year < 2025, ],
mapping = aes(x = min_year, fill = themes_name)) +
geom_bar(stat = "count", position = "stack") +
labs(title = "Lego brick themes: Theme's introduced by year",
subtitle = "",
x = "Year",
y = "Count of theme's introduced") +
scale_x_continuous(limits = c(1949, 2026), breaks = seq(1950, 2025, 10)) +
scale_y_continuous(limits = c(0, 20)) +
scale_fill_grey()
Plot showing the number of themes introduced between 1949 and 2024.
ggplot(data = data_themes[data_themes$min_year < 2025 & data_themes$max_year < 2025, ],
mapping = aes(x = max_year, fill = themes_name)) +
geom_bar(stat = "count", position = "stack") +
labs(title = "Lego brick themes: Theme's retired by year",
subtitle = "",
x = "Year",
y = "Count of theme's retired") +
scale_x_continuous(limits = c(1949, 2026), breaks = seq(1950, 2025, 10)) +
scale_y_continuous(limits = c(0, 20)) +
scale_fill_grey()
Plot showing the number of themes retired between 1949 and 2024.
# Updated plots of themes introduced and retired per year
data_themes_totalPerYear <- data.frame(
year = seq(min(data_themes$min_year, na.rm = TRUE),
2024, 1) # Limit years to 2024
)
for (i in 1:nrow(data_themes_totalPerYear)){
data_themes_totalPerYear$intro[i] <- sum(data_themes$min_year == data_themes_totalPerYear$year[i])
data_themes_totalPerYear$retir[i] <- sum(data_themes$max_year == data_themes_totalPerYear$year[i])
}
data_themes_totalPerYear$diff <- data_themes_totalPerYear$intro - data_themes_totalPerYear$retir
data_themes_totalPerYear <- melt(data = data_themes_totalPerYear,
id.vars = "year",
variable.name = "var",
value.name = "value")
plt_label <- c("intro" = "Themes introduced", "retir" = "Themes retired", "diff" = "Difference")
ggplot(data = data_themes_totalPerYear, mapping = aes(x = year, y = value, colour = var)) +
geom_line() +
facet_wrap(~ var, ncol = 1, labeller = as_labeller(plt_label)) +
labs(title = "Lego brick themes",
subtitle = "Number of themes introduced, retired and difference per year",
x = "Year",
y = "Count of themes")
Plot showing the number of themes introduced, retired and difference between 1949 and 2024.
The above plots show that until 1990 there was a consistent increase in the number of themes with changes per year being less than 4 themes, the exception being 1970 where there were 10 themes introduced. However, in 1995 there was an increased activity in both themes introduced and retired. As noted above, this aligns with the introduction of licensing entertainment franchises starting in 1999 could explain the increased levels of theme introduction and retirement as franchises become and loose popularity.
ggplot(data = data_themes[data_themes$min_year < 2025 & data_themes$max_year < 2025, ],
mapping = aes(x = min_year, y = sets_per_year, fill = themes_name)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Lego brick theme: Sets introduced per year",
subtitle = "Average number of sets available, coloured by theme",
x = "Year that theme was introduced",
y = "Average number of sets",
caption = "") +
scale_x_continuous(limits = c(1949, 2026), breaks = seq(1950, 2025, 10)) +
scale_fill_grey()
The above shows the average number of sets available before 1995 was below 50 (mostly below 25). After 1995, there are then 2 peaks in available sets around 2000 and 2015 with peak values greater than 130 sets and a dip in between these peaks in 2005 where less than 25 sets are available.
Like the theme trends discussed above, the dip in the number of available sets in 2005 could be a result of activities to address the record deficit that the Lego company experience at this point in time.
Also, looking back at the colour vs year plot (in the exploration of the data section above), the peak number of available colours was in 2004 and since 2007 the number of colours has stayed around 75. This reduction and constant number of colours could be the result of an efficiency exercise to streamline supply of plastics as part of the Lego business’ activities to address its deficit.